Method for screening genes expressing at desired sites

ABSTRACT

The present invention relates to a method for inferring a plant organ, in which a certain gene is to be expressed, using a part of a base sequence, a method for searching for a gene which is to be expressed at a desired site, and a composition, kit, system and program for carrying out these methods. The present invention also relates to a method for inferring a plant organ, in which a plant gene is to be expressed, based on information about the presence or absence of a base sequence which is highly similar to a transposable element in the vicinity of a protein coding region of a plant gene.

This application claims the benefit of PCT International Application Ser. No. PCT/JP01/10195 filed Nov. 21, 2001, which is incorporated herein by reference in its entirety.

TECHNICAL FIELD

The present invention relates to a method for inferring a plant organ, in which a certain gene is to be expressed, using a part of a base sequence; a method for searching for a gene which is to be expressed at a desired site (e.g., a site containing a flower); a composition, kit, system and program for carrying out these methods; and products obtained by these methods (nucleic acid molecules, etc.). More preferably, the present invention relates to a method for inferring a plant organ, in which a plant gene is to be expressed, based on information about the presence or absence of a base sequence which is highly similar to a transposable element (e.g., a transposon) in the vicinity of a protein coding region; a composition, kit, system, and program for carrying out the method; and products obtained by these methods.

BACKGROUND ART

Recent progress in genome base sequence analysis technologies has provided a rapid improvement in analysis speed and a reduction in analysis cost, whereby analysis of the structure of the genomes of various organisms is proceeding at dramatic speed. Current methods for inferring the function of a gene from its deciphered base sequence depend on the presence or absence of a similar sequence found by searching DNA or protein sequence data registered in an international database, such as GenBank or DDBJ, using, for example, the PSI-BLAST algorithm (Altschul et al., Nucleic Acids Res. 25:3389-3402 (1997)) or the like. In these methods, the function of a sequence is estimated from its similarity to known nucleic acids or proteins. In order to estimate the amino acid sequence of a protein coded by a base sequence, various gene-modeling programs that predict the positions of exons and introns from the genome base sequence have been developed. Attempts at full automation are proceeding; however, accurate gene modeling still essentially requires manual editing of the predicted results, and accuracy problems remain unsolved.

Some databases and programs for predicting gene expression sites based on a genome base sequence have been developed (Ghosh, Nucleic Acids Res., 21:3117-3118 (1993); Ghosh, Nucleic Acids Res., 26:360-361 (1998); Heinemeyer et al., Nucleic Acids Res., 26:362-367 (1998)), though accuracy problems remain unsolved.

For plants, a database compiling about 400 cis-element motifs found in the expression control regions of plant genes, to which transcription regulatory factors bind, has been constructed (Higo et al., Nucleic Acids Res. 26:358-359 (1998); Nucleic Acids Res. 27:297-300 (1999)). When analysis is carried out using a base sequence inferred to be a promoter as a query, each cis-element motif present in the base sequence is displayed. However, although these motifs may function as cis elements, there is no evidence that they actually do so. Therefore, there is a demand for the development of a method for inferring a gene expression site (expression tissue/expression organ) from a genome base sequence.

Clarification of gene expression sites would help reveal the functions of individual genes and could make it possible to isolate and utilize promoter portions. In the field of plants, the development of tissue-specific promoters would make it possible, using transformation technologies, to express genes or inhibit gene expression specifically in individual tissues. For example, if an anther-specific promoter were developed, the following applications would be expected.

It is known that an F1 hybrid (first filial generation) generated by crossing between varieties may have superior properties compared with its parents. Such inter-variety crossing has long attracted attention as a method for breeding crops. For self-pollinating crops such as rice, methods for producing male sterility strains have been studied as a technology required to utilize this property. Conventionally, male sterility strains have been searched for among plant genetic resources, or mutagenesis has been used to select male sterility strains. However, with these methods it is difficult to introduce a male sterility gene into a commercial variety, and their use is limited.

A recent promising approach uses biotechnology to link a promoter which is expressed in an anther and/or pollen with a gene having a function of inhibiting formation of an anther and/or pollen (e.g., nuclease, protease, and glucanase), and to introduce the linked genes into a plant so as to prevent formation of fertile pollen. An alternative promising approach uses a promoter which is expressed in an anther and/or pollen to transcribe antisense RNA against a gene which is expressed upon formation of an anther and/or pollen, or introduces into a plant a ribozyme which degrades the mRNA of that gene.

There are several known promoters for genes which are expressed in an anther and/or pollen. Unfortunately, however, the activities of these promoters are too low for practical use, or their period of expression is limited. It would be very useful to isolate promoters which function at each developmental stage of an anther or pollen, clarify the features of each promoter, and produce a high-activity promoter cassette so as to artificially control formation of an anther and/or pollen.

Therefore, for example, if a promoter which has high activity, can be practically used, and is directed to a desired site (e.g., an anther or pollen) can be obtained from a rice gene, such a promoter could contribute greatly to the breeding of crops such as rice. Further, in order to modify a component of each flower tissue, such as a petal pigment or a protein involved in adhesion of pollen to a pistil, it is necessary to obtain a gene which is to be expressed in a flower.

To this end, what is required is a method for efficiently searching a DNA database, in which a vast number of genome base sequences are stored, for a gene which is to be expressed in a flower, or a method for efficiently screening a genome DNA library for a gene which is to be expressed at a desired site (e.g., a flower).

DISCLOSURE OF THE INVENTION

To achieve the above-described objects, the present invention provides the following.


1. A method for detecting a gene which is to be expressed at a desired site in a plant, comprising the step of:

(1) searching a gene population using a transposon sequence as a key sequence.

2. A method according to item 1, further comprising the step of:

(2) selecting a gene having similarity to the transposon sequence in the vicinity of a putative protein coding region.

3. A method according to item 1, wherein the transposon sequence is a MITE sequence.

4. A method according to item 1, wherein the desired site is a site containing a flower.

5. A method according to item 4, wherein the site containing a flower is a flower.

6. A method according to item 1, wherein the desired site contains at least one site selected from a stamen and a pistil.

7. A method according to item 1, wherein the plant is a monocotyledon.

8. A method according to item 1, wherein the plant is rice.

9. A method according to item 1, wherein the transposon sequence is a Tourist sequence.

10. A method according to item 1, wherein the transposon sequence contains at least about 10 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1.

11. A method according to item 1, wherein the transposon sequence contains at least about 15 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1.

12. A method according to item 1, wherein the transposon sequence contains at least about 20 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1.

13. A method according to item 1, wherein the transposon sequence contains at least about 50 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1.

14. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 70% homology to the sequence indicated by SEQ ID NO: 1.

15. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 80% homology to the sequence indicated by SEQ ID NO: 1.

16. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 90% homology to the sequence indicated by SEQ ID NO: 1.

17. A method according to item 1, wherein the transposon sequence has at least one substitution, addition or deletion in the sequence indicated by SEQ ID NO: 1.

18. A method according to item 1, wherein the transposon sequence is substantially the same as the sequence indicated by SEQ ID NO: 1.

19. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 70% homology to the sequence indicated by SEQ ID NO: 2.

20. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 80% homology to the sequence indicated by SEQ ID NO: 2.

21. A method according to item 1, wherein the transposon sequence contains a sequence having at least about 90% homology to the sequence indicated by SEQ ID NO: 2.

22. A method according to item 1, wherein the transposon sequence has at least one substitution, addition or deletion in the sequence indicated by SEQ ID NO: 2.

23. A method according to item 1, wherein the transposon sequence is substantially the same as the sequence indicated by SEQ ID NO: 2.

24. A method according to item 1, wherein the gene population is a database and the key sequence is a query sequence.

25. A method according to item 24, wherein the database is a DNA database.

26. A method according to item 24, wherein the search is carried out by a search method selected from the group consisting of BLAST, FASTA, the Smith-Waterman method, and the Needleman-Wunsch method.

27. A method according to item 1, wherein the gene population is a library and the key sequence is a probe sequence.

28. A method according to item 27, wherein the library is a DNA library.

29. A method according to item 27, wherein the search is carried out by a search method selected from the group consisting of stringent hybridization, microarray assay, PCR, and in situ hybridization.

30. A method according to item 2, wherein the vicinity of the putative protein coding region includes the region within about 2 kbp upstream of the translation initiation codon, the region within about 1.1 kbp downstream of the translation termination codon, and the introns.

31. A method according to item 2, wherein the similarity is at least about 66% homology.

32. A method according to item 2, wherein the similarity is about 70%.

33. A method according to item 2, wherein the similarity is about 80%.

34. A composition for detecting a gene which is to be expressed at a site containing a flower, comprising a plasmid containing at least about 10 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1.

35. A kit for detecting a gene which is to be expressed at a desired site in a plant, comprising:

(1) a plasmid containing at least about 10 contiguous nucleotides in the sequence indicated by SEQ ID NO: 1; and

(2) a DNA library.

36. A method for producing a gene which is to be expressed at a desired site in a plant, comprising the steps of:

(1) searching a gene population using a transposon sequence;

(2) selecting a gene having similarity to the transposon sequence in a putative protein coding region; and

(3) producing a nucleic acid molecule coding the gene.

37. A method according to item 36, wherein the production is carried out in vitro or in vivo.

38. A nucleic acid molecule coding a gene which is to be expressed at a desired site in a plant, wherein a base sequence of the nucleic acid molecule is obtained by a method comprising the step of:

(1) searching a gene population using a transposon sequence as a key sequence.

39. A recording medium storing a program for allowing a computer to execute automatic computation for detecting a gene which is to be expressed at a desired site in a plant, the automatic computation comprising the steps of:

(1) providing a transposon sequence as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

40. A program for allowing a computer to execute automatic computation for detecting a gene which is to be expressed at a desired site in a plant, the automatic computation comprising the steps of:

(1) providing a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

41. A system for detecting a gene which is to be expressed at a desired site in a plant, the system comprising:

(A) a computer; and

(B) a program for allowing the computer to execute automatic computation for detecting the gene which is to be expressed at the desired site in the plant,

wherein the automatic computation comprises the steps of:

(1) providing a transposon sequence as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

42. A system according to item 41, wherein the computer is linked to a network.

43. A method for inferring an organ of a plant in which a gene is to be expressed, comprising the step of:

(1) obtaining information about whether or not a base sequence similar to the sequence of a transposable element is present in the vicinity of the gene, and, when the similar sequence is present in the vicinity of the gene, inferring that the gene is to be expressed in the plant organ relating to the transposable element sequence.

44. A method according to item 43, wherein the plant organ relating to the transposable element sequence is a site containing a flower.

45. A method according to item 44, wherein the site containing a flower contains a site selected from the group consisting of a stamen and a pistil.

46. A method according to item 43, wherein the sequence similar to the transposable element sequence is a MITE sequence.

47. A method according to item 43, wherein the sequence similar to the transposable element sequence is a Tourist sequence.

48. A method according to item 43, wherein the plant includes rice.

49. A nucleic acid molecule coding a gene obtained by a method according to item 43.

50. A recording medium storing a sequence coding a gene obtained by a method according to item 43.

51. A method for modifying an expression pattern of a gene of a plant, comprising the step of utilizing the sequence of a gene obtained by a method according to item 43.

52. A kit for inferring a plant organ in which a gene is to be expressed, comprising:

(1) a molecule having a transposable element sequence.

53. A kit for inferring a plant organ in which a gene is to be expressed, comprising:

(1) a recording medium storing a transposable element sequence.

54. A recording medium storing a program for allowing a computer to execute automatic computation for inferring a plant organ in which a gene is to be expressed, the automatic computation comprising the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing the sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

55. A program for allowing a computer to execute automatic computation for inferring a plant organ in which a gene is to be expressed, the automatic computation comprising the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing the sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

56. A system for inferring a plant organ in which a gene is to be expressed, comprising:

(A) a computer; and

(B) a program for allowing the computer to execute automatic computation for inferring the plant organ in which the gene is to be expressed, the automatic computation comprising the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing the sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

57. A system according to item 56, wherein the computer is linked to a network.
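The four-step automatic computation recited in items 39 to 41 and 54 to 56, combined with the vicinity definition of item 30 and the similarity threshold of item 31, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the computation actually used by the inventors: the `GeneModel` record and all function names are hypothetical, and ungapped per-base identity is used as a simple stand-in for "homology". In practice the search step would use one of the methods listed in item 26, such as BLAST, FASTA, or a Smith-Waterman or Needleman-Wunsch alignment.

```python
from dataclasses import dataclass, field

UPSTREAM_BP = 2000    # item 30: about 2 kbp upstream of the initiation codon
DOWNSTREAM_BP = 1100  # item 30: about 1.1 kbp downstream of the termination codon


@dataclass
class GeneModel:
    """Hypothetical gene record; coordinates are 0-based and end-exclusive."""
    name: str
    genome: str      # genomic sequence containing the gene, 5' to 3'
    cds_start: int   # first base of the translation initiation codon
    cds_end: int     # one past the last base of the termination codon
    introns: list = field(default_factory=list)  # (start, end) pairs


def revcomp(seq):
    """Reverse complement, so that elements in either orientation are found."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def vicinity_regions(gene):
    """Regions treated as the vicinity of the putative protein coding region."""
    upstream = (max(0, gene.cds_start - UPSTREAM_BP), gene.cds_start)
    downstream = (gene.cds_end, min(len(gene.genome), gene.cds_end + DOWNSTREAM_BP))
    return [upstream, downstream] + list(gene.introns)


def best_identity(query, target):
    """Highest fraction of identical bases over all ungapped placements of query in target."""
    n = len(query)
    best = 0.0
    for i in range(len(target) - n + 1):
        matches = sum(a == b for a, b in zip(query, target[i:i + n]))
        best = max(best, matches / n)
    return best


def search(query, database, threshold=0.66):
    """Steps (1)-(4): scan each gene's vicinity with the query and report hits."""
    hits = []
    for gene in database:
        score = 0.0
        for start, end in vicinity_regions(gene):
            region = gene.genome[start:end].upper()
            for q in (query.upper(), revcomp(query)):
                score = max(score, best_identity(q, region))
        if score >= threshold:
            hits.append((gene.name, round(score, 3)))
    return hits
```

Genes returned by `search` would correspond to the candidates of item 43, i.e., genes inferred to be expressed in the organ associated with the element (e.g., a flower). The exhaustive ungapped scan is only suitable for short illustrative sequences.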

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing Tourist-OsaCatA present in the promoter region of the rice catalase CatA gene.

FIG. 2 is a diagram showing a result of analysis (RT-PCR) of the expression of known genes in the vicinity of Tourist-OsaCatA. In the figure, phy18 indicates the phytochrome 18 gene (Kay et al., 1989); A1 gene indicates a putative NADPH-dependent reductase A1 gene (Chen and Bennetzen, 1996); XA21E gene indicates Xa21 family member E (Song et al., 1997); sbe1 indicates the 1,4-α-glucan branching enzyme gene (Kawasaki et al., 1993); OCII gene indicates the Oryzacystatin II gene (Kondo et al., 1991); amy2A indicates the α-amylase gene (Huang et al., 1992); HMGR gene indicates the 3-hydroxy-3-methylglutaryl coenzyme A reductase gene (Nelson et al., 1994); and CatA indicates the catalase CatA gene (Higo and Higo, 1996). Cycles indicate the number of PCR cycles. L indicates a leaf, R indicates a root, F indicates a flower, and S indicates an immature seed.

FIG. 3 is a diagram showing a result of analysis (RT-PCR) of the expression of genes corresponding to ESTs in the vicinity of Tourist-OsaCatA. The same symbols as those in FIG. 2 indicate the same elements.

FIG. 4 is a diagram showing a result of RT-PCR analysis of the expression, in each organ of rice, of putative protein coding regions (CDS) in the vicinity of Tourist-OsaCatA in BAC/PAC clones. The same symbols as those in FIG. 2 indicate the same elements.

FIG. 5 is a diagram showing a result of Southern analysis of total DNA of rice (variety: Nipponbare). In the Southern analysis, DNA was digested with HindIII and XhoI (lane 1), EcoRV and HindIII (lane 2), BamHI (lane 3), or EcoRI (lane 4), followed by electrophoresis. After hybridization with a probe, the filter was washed at low stringency (A) and then at high stringency (B). The 2 kb and 4 kb fragments containing Tourist-OsaCatA are indicated with an asterisk and an arrowhead, respectively. The sizes (kbp) of marker DNA (λDNA/HindIII) are indicated on the left.

FIG. 6 is a diagram showing results of RT-PCR analysis of the expression, at each site of the rice glumose flower, of known genes and genes corresponding to ESTs in the vicinity of Tourist-OsaCatA (A) and of CDS (B), where lane 1: stamen, lane 2: pistil, lane 3: lemma/palea, lane 4: glumose flower base (rachilla, glume, rudimentary glume, lodicule).

FIG. 7 is a diagram showing a comparison between a base sequence (115 bp) in the middle portion of Tourist-OsaCatA (OsaCatA) and the corresponding regions in Tourist-OsaCatA-like sequences in the vicinity of putative protein coding regions (CDS) whose expression was detected. Dots (•) indicate bases identical to those of Tourist-OsaCatA. Positions having no corresponding base are indicated by gaps (−).

FIG. 8 is a diagram showing an exemplary computer system for carrying out the present invention.
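The dot notation described for FIG. 7 can be reproduced with a small helper, which may be useful when preparing similar comparisons. The sketch below is illustrative only: the function names are hypothetical, the input is assumed to be a pair of already-aligned strings of equal length that use "-" for gaps, and percent identity is computed over all alignment columns, which is one common convention. The specification does not state how its homology values were calculated.

```python
def dot_notation(reference, aligned):
    """Render an aligned sequence against the reference in the FIG. 7 style:
    '.' for a base identical to the reference, the base itself for a mismatch,
    and '-' where the aligned sequence has no corresponding base."""
    if len(reference) != len(aligned):
        raise ValueError("aligned sequences must have equal length")
    out = []
    for r, a in zip(reference.upper(), aligned.upper()):
        if a == "-":
            out.append("-")
        elif a == r:
            out.append(".")
        else:
            out.append(a)
    return "".join(out)


def percent_identity(reference, aligned):
    """Identical positions as a percentage of alignment columns (gap-only columns ignored)."""
    pairs = [(r, a) for r, a in zip(reference.upper(), aligned.upper())
             if not (r == "-" and a == "-")]
    matches = sum(1 for r, a in pairs if r == a and r != "-")
    return 100.0 * matches / len(pairs)


print(dot_notation("ACGTACGTAC", "ACGAAC-TAC"))      # ...A..-...
print(percent_identity("ACGTACGTAC", "ACGAAC-TAC"))  # 80.0
```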

DETAILED DESCRIPTION OF THE INVENTION

(Simple Explanation of each Sequence)

SEQ ID NO: 1 indicates a representative key sequence according to the present invention.

SEQ ID NO: 2 indicates a preferable key sequence according to the present invention.

SEQ ID NOs: 3 to 14 indicate sequences which are known to have a base sequence similar to Tourist.

SEQ ID NO: 3 indicates positions 4621 to 8640 of Accession No. X14172 (phy18) (amino acid coding regions (4626..6690, 6913..7729, 8011..8307, 8410..8617)).

SEQ ID NO: 4 indicates positions 26881 to 28560 of Accession No. U70541 (A1 gene) (amino acid coding regions (26910..27030, 27143..27507, 27894..28526)).

SEQ ID NO: 5 indicates positions 2761 to 5280 of Accession No. U72724 (XA21E gene) (amino acid coding region (2819..5260)).

SEQ ID NO: 6 indicates positions 3301 to 10620 of Accession No. D10838 (sbe1) (amino acid coding regions (3360..3443, 3546..3608, 5821..6028, 6144..6213, 6648..6917, 7026..7932, 8245..8361, 8519..8581, 9019..9126, 9595..9696, 9862..9929, 10011..10091, 10210..10326, 10408..10612)).

SEQ ID NO: 7 indicates positions 421 to 1200 of Accession No. X57658 (OC-II gene) (amino acid coding regions (446..574, 983..1174)).

SEQ ID NO: 8 indicates positions 541 to 4080 of Accession No. M74177 (amy2A) (amino acid coding regions (581..661, 743..875, 2379..3199, 3744..4040)).

SEQ ID NO: 9 indicates positions 1981 to 6480 of Accession No. L28995 (HMGR gene) (amino acid coding regions (2018..2775, 4836..5017, 5631..5977, 6202..6444)).

SEQ ID NO: 10 indicates positions 1561 to 3840 of Accession No. D29966 (CatA) (amino acid coding regions (1591..1605, 1894..2694, 2781..3380, 3730..3789)).

SEQ ID NO: 11 indicates positions 1081 to 5340 of Accession No. X89226 (LRK2 gene) (amino acid coding regions (1126..3733, 4934..5298)).

SEQ ID NO: 12 indicates positions 1381 to 2040 of Accession No. X52422 (RAB16B gene) (amino acid coding regions (1396..1629, 1725..1985)).

SEQ ID NO: 13 indicates positions 1081 to 4740 of Accession No. Z15085 (GP28 gene) (amino acid coding regions (1094..1381, 2773..4735)).

SEQ ID NO: 14 indicates positions 901 to 2187 of Accession No. U72255 (GNS9 gene) (amino acid coding regions (956..1028, 1127..2187)).

SEQ ID NOs: 15 to 25 are base sequences containing protein coding regions (CDS) of genes which were confirmed in the present invention to be expressed at desired sites.

SEQ ID NO: 15 indicates positions 96961 to 98100 of Accession No. AB023482 (CDS3) (CDS regions (96980..97015, 97192..98055)).

SEQ ID NO: 16 indicates positions 55621 to 60600 of Accession No. AB026295 (CDS6) (CDS regions (55634..55706, 56057..56417, 57951..58143, 58542..59093, 59182..59328, 60209..60562)).

SEQ ID NO: 17 indicates positions 19141 to 22261 of Accession No. AJ243961 (CDS7) (CDS region (20178..21866)).

SEQ ID NO: 18 indicates positions 31201 to 33000 of Accession No. AJ243961 (CDS8) (CDS regions (complement(32825..32949), complement(30355..31213))).

SEQ ID NO: 19 indicates positions 13561 to 18600 of Accession No. AJ245900 (CDS9) (CDS regions (complement(18519..18594), complement(17735..17832), complement(17328..17361), complement(17012..17148), complement(16646..16712), complement(16324..16423), complement(15519..15682), complement(14988..15034), complement(14833..14880), complement(14081..14594), complement(13572..13582))).

SEQ ID NO: 20 indicates positions 47761 to 55560 of Accession No. AJ245900 (CDS10) (CDS regions (complement(55452..55548), complement(54532..55083), complement(54172..54276), complement(53484..53745), complement(51359..51407), complement(51193..51277), complement(50866..50958), complement(50465..50731), complement(48371..48894), complement(47810..48283))).

SEQ ID NO: 21 indicates positions 92341 to 97980 of Accession No. AP000361 (CDS12) (CDS regions (complement(92382..92477, 92598..92649, 92771..92844, 92951..93001, 93081..93188, 93449..93550, 93734..93820, 94559..94601, 94689..94817, 94917..94994, 95080..95129, 95344..95520, 95872..95997, 96271..96384, 96876..96941, 97031..97096, 97723..97764, 97908..97928))).

SEQ ID NO: 22 indicates positions 7921 to 14160 of Accession No. AP000559 (CDS16) (CDS regions (7961..8199, 8666..8737, 8962..9033, 9134..9205, 9487..9558, 9770..9841, 9939..10010, 10098..10169, 10254..10322, 10440..10511, 10637..10708, 10792..10863, 10948..11019, 11102..11173, 11262..11333, 11448..11519, 11611..11682, 11795..11866, 11963..12034, 12124..12195, 12272..12353, 12398..12515, 12601..12732, 12838..13176, 13259..13629, 13761..14114)).

SEQ ID NO: 23 indicates positions 76801 to 78960 of Accession No. AP000559 (CDS17) (CDS region (complement(76828..78936))).

SEQ ID NO: 24 indicates positions 49981 to 53460 of Accession No. AP000570 (CDS23) (CDS regions (50022..50087, 50181..50281, 50401..50558, 50707..50781, 51681..51820, 52437..52530, 53216..53424)).

SEQ ID NO: 25 indicates positions 95341 to 98220 of Accession No. AP000836 (CDS26) (CDS regions (complement(95361..95398, 95488..95556, 95925..96026, 97898..98003, 98148..98168))).
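Each coordinate range above is an inclusive, 1-based span written in the style of GenBank feature locations, with "complement" marking spans on the reverse strand. As a hypothetical helper that is not part of the specification, the ranges can be parsed and the total coding length summed as sketched below. The regular expressions are an assumption tailored to the formatting used in this list; they accept two or three dots between coordinates.

```python
import re

# A range such as "96980..97015" or "96980 . . . 97015" (two or three dots).
RANGE = re.compile(r"(\d+)\s*(?:\.\s*){2,3}(\d+)")
# "complement(...)" or "complements (...)" enclosing one or more ranges.
COMPLEMENT = re.compile(r"complements?\s*\(([^()]*)\)")


def parse_cds(text):
    """Return sorted (start, end, strand) tuples; coordinates are 1-based and inclusive."""
    spans = []
    for m in COMPLEMENT.finditer(text):
        spans += [(int(a), int(b), "-") for a, b in RANGE.findall(m.group(1))]
    remainder = COMPLEMENT.sub(" ", text)  # drop complement groups already handled
    spans += [(int(a), int(b), "+") for a, b in RANGE.findall(remainder)]
    return sorted(spans)


def coding_length(spans):
    """Total number of bases covered by the spans."""
    return sum(end - start + 1 for start, end, _ in spans)


cds3 = parse_cds("CDS regions (96980..97015, 97192..98055)")
print(cds3, coding_length(cds3))  # [(96980, 97015, '+'), (97192, 98055, '+')] 900
```

For SEQ ID NO: 15, for example, the two spans cover 36 + 864 = 900 bases, a multiple of three, which is consistent with a coding sequence that includes its termination codon.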

SEQ ID NOs: 26 to 37 indicate the sequences of the corresponding regions in the Tourist-OsaCatA-like sequences in the vicinity of putative protein coding regions (CDS) whose expression was detected at a desired site: SEQ ID NO: 26 (Osa#3) (homology to Tourist-OsaCatA: 82.6%); SEQ ID NO: 27 (Osa#6) (homology to Tourist-OsaCatA: 90.4%); SEQ ID NO: 28 (Osa#7) (homology to Tourist-OsaCatA: 73.5%); SEQ ID NO: 29 (Osa#8) (homology to Tourist-OsaCatA: 84.3%); SEQ ID NO: 30 (Osa#9) (homology to Tourist-OsaCatA: 65.8%); SEQ ID NO: 31 (Osa#10) (homology to Tourist-OsaCatA: 86.2%); SEQ ID NO: 32 (Osa#12) (homology to Tourist-OsaCatA: 77.4%); SEQ ID NO: 33 (Osa#16) (homology to Tourist-OsaCatA: 85.2%); SEQ ID NO: 34 (Osa#17) (homology to Tourist-OsaCatA: 81.9%); SEQ ID NO: 35 (Osa#18) (homology to Tourist-OsaCatA: 87.8%); SEQ ID NO: 36 (Osa#24) (homology to Tourist-OsaCatA: 89.6%); SEQ ID NO: 37 (Osa#29) (homology to Tourist-OsaCatA: 90.4%).
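Items 31 to 33 state similarity thresholds of about 66%, 70%, and 80%. Comparing the homology values listed above against those thresholds is simple arithmetic. The sketch below uses hypothetical names and values copied from the list. It shows that a strict cutoff of exactly 66% would exclude Osa#9 (65.8%), even though that sequence lies in the vicinity of a CDS whose expression was detected.

```python
# Homology to Tourist-OsaCatA (%) for SEQ ID NOs: 26 to 37, copied from the list above.
HOMOLOGY = {
    "Osa#3": 82.6, "Osa#6": 90.4, "Osa#7": 73.5, "Osa#8": 84.3,
    "Osa#9": 65.8, "Osa#10": 86.2, "Osa#12": 77.4, "Osa#16": 85.2,
    "Osa#17": 81.9, "Osa#18": 87.8, "Osa#24": 89.6, "Osa#29": 90.4,
}


def at_or_above(threshold):
    """Names whose homology meets a strict percentage cutoff, in listed order."""
    return [name for name, h in HOMOLOGY.items() if h >= threshold]


for cutoff in (66, 70, 80):
    print(cutoff, len(at_or_above(cutoff)), "of", len(HOMOLOGY))
# 66 11 of 12
# 70 11 of 12
# 80 9 of 12
```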

Hereinafter, the present invention will be described.

It should be understood throughout the present specification that articles for singular forms (e.g., “a”, “an”, “the”, etc. in English; “ein”, “der”, “das”, “die”, etc. and their inflections in German; “un”, “une”, “le”, “la”, etc. in French; and articles, adjectives, etc. in other languages) include the concept of their plurality unless otherwise mentioned. It should also be understood that terms as used herein have the definitions ordinarily used in the art unless otherwise mentioned.

(Best Mode for Carrying out the Invention)

As used herein, “transposon” or “transposon sequence” refers to a DNA sequence having a predetermined structure which can undergo transposition on chromosomal DNA. Transposons are ubiquitous in bacteria, yeast, maize, Drosophila, and the like. Transposition sites are not fixed, and transposons can be inserted into essentially any gene. When a transposon is inserted in the vicinity of a gene, the transposon may influence expression of the gene. When a transposon is inserted within a gene, the gene may be inactivated.

As used herein, “MITE (miniature inverted-repeat transposable element)” or “MITE sequence” refers to a transposable element of small size (typically 0.5 kb or less) which is scattered throughout chromosomal DNA and has terminal inverted repeats.
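The two defining features above, small size and terminal inverted repeats, can be screened for computationally. The following is a crude sketch with hypothetical function names. The 500 bp size limit follows the "typically 0.5 kb or less" wording, while the 10 bp minimum repeat length is purely an illustrative assumption. Real terminal inverted repeats often tolerate mismatches, which this exact-match version does not.

```python
def revcomp(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def tir_length(seq):
    """Length of the perfect terminal inverted repeat: the longest k such that the
    first k bases equal the reverse complement of the last k bases."""
    seq = seq.upper()
    rc = revcomp(seq)  # rc[:k] is the reverse complement of seq[-k:]
    k = 0
    while k < len(seq) // 2 and seq[k] == rc[k]:
        k += 1
    return k


def looks_like_mite(seq, max_len=500, min_tir=10):
    """Crude screen: small element (<= max_len bp) with a terminal inverted repeat
    of at least min_tir bp. Both thresholds are illustrative assumptions."""
    return len(seq) <= max_len and tir_length(seq) >= min_tir


tir = "GGCCAGTTCA"  # hypothetical 10 bp terminal repeat
element = tir + "A" * 30 + revcomp(tir)
print(tir_length(element), looks_like_mite(element))  # 10 True
```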

To date, genetic elements (transposable elements) which can transpose within the same DNA molecule of a chromosome or between DNA molecules of different chromosomes have been found in various organisms (Finnegan, 1989; Flavell et al., 1994; Bennetzen, 2000). Transposable elements are classified into two classes according to their transposition mechanism. Class I elements undergo transposition through reverse transcription of RNA transcription intermediates. Class II elements undergo transposition directly from DNA to DNA (Finnegan, 1989). MITEs, a novel class of transposable elements, were first reported in plants (Zhang et al., 2000 and references cited therein). A MITE has a structure typical of DNA transposable elements but does not code for the transposase essential for transposition (Bennetzen, 2000). MITEs tend to be present in the vicinity of genes (Mao et al., 2000 and references cited therein) and were the most frequently found transposable elements in 910 kb of rice genome base sequence (Turcotte et al., 2001).

As used herein, “Tourist” sequence and “Tourist” element, which are used interchangeably, refer to a base sequence (element) which undergoes transposition on chromosomal DNA, or which is considered to have been produced by transposition. A Tourist sequence is a type of MITE sequence and was originally identified in the Waxy (wx) gene of maize (Zea mays). Tourist sequences are characterized by terminal inverted repeats, small size, a preference for a particular base sequence at the insertion site, and a stable DNA secondary structure (Bureau and Wessler, 1992). Tourist sequences are classified into four subfamilies (Tourist-A, B, C and D) according to their internal base sequence (Bureau and Wessler, 1994). The Tourist-A element is found in maize, Tourist-B in sorghum (Sorghum bicolor), Tourist-C in rice (Oryza sativa) and sorghum, and Tourist-D in maize and barley (Hordeum vulgare).

Tourist-A is characterized by a repeated sequence consisting of GGATT. Tourist-B is characterized by box I, domain I, and a subterminal polyA/polyT region. Tourist-C is characterized by box I, domains I and I′, and a subterminal polyA/polyT region. Tourist-D is characterized in that its internal base sequence contains no conserved region.
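Two of the subfamily features above can be checked directly on a candidate sequence: the GGATT repeat of Tourist-A and a subterminal polyA/polyT stretch. The sketch below is a rough illustration with hypothetical function names. The 50 bp end window is an assumption, and box I and domain I are not checked because their sequences are not given here.

```python
import re


def longest_tandem_run(seq, motif="GGATT"):
    """Copies of motif in its longest uninterrupted tandem run (Tourist-A feature)."""
    motif = motif.upper()
    runs = re.findall(f"(?:{re.escape(motif)})+", seq.upper())
    return max((len(run) // len(motif) for run in runs), default=0)


def longest_subterminal_poly_at(seq, window=50):
    """Longest run of A or of T within `window` bases of either end of the element,
    a rough proxy for the subterminal polyA/polyT region of Tourist-B and -C."""
    seq = seq.upper()
    best = 0
    for part in (seq[:window], seq[-window:]):
        for m in re.finditer(r"A+|T+", part):
            best = max(best, len(m.group()))
    return best


print(longest_tandem_run("CC" + "GGATT" * 3 + "CC"))  # 3
```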

If Tourist was inserted in the vicinity of genes which are to be expressed in a flower before the divergence of cereals such as rice, maize, and sorghum, a Tourist-like sequence may also be found in the vicinity of genes which are to be expressed at a specific site (e.g., a flower) in maize, sorghum, and other plants besides rice. Those skilled in the art will readily understand that the portion of the Tourist base sequence used for detection varies depending on the purpose. Since the divergence of Tourist into types A, B, C and D, the base sequence of another portion of Tourist may have become suitable for detection for other specific purposes.

Alternatively, it can be contemplated that the Tourist sequence gradually diverged into Tourist-A, B, C, and D along with the speciation of rice, maize, sorghum, and the like, and that thereafter Tourist-A, B, C, or D independently underwent transposition in the respective plant species and was inserted in the vicinity of genes. Therefore, Tourist-A, B or D may also be utilized for detection of genes which are to be expressed in a specific organ. Specifically, since a long time has passed since the speciation of cereals, Tourist-A, B, C and D can be distinguished from each other, and by using a certain portion of these base sequences, genes which are to be expressed in a specific organ can be detected.

In the present invention, the Tourist-C sequence (also referred to as the Tourist-C element) is preferably used. The Tourist-C sequence is characterized by box I, domains I and I′, and subterminal polyA/polyT regions. Domain I′ has a sequence similar to that of the complementary strand of domain I (Bureau and Wessler, 1992). Tourist elements are known to be found in introns of genes or in regions in the vicinity of genes (Bureau and Wessler, 1992, 1994; Bureau et al., 1996). It is therefore believed that Tourist elements are preferentially inserted in the vicinity of protein coding regions in a genome.

More preferably, the Tourist-C sequence of the present invention may be the Tourist-OsaCatA sequence. Tourist-OsaCatA is a Tourist element present in the 5′-upstream region of the CatA gene (Iwamoto et al., 1999); it was found by comparing the base sequence of the CatA gene, one of the rice catalase genes (Higo and Higo, 1996), among various types of rice of the genus Oryza. Therefore, in one embodiment, Tourist-OsaCatA may be used to find “conserved base sequences” suitable for phylogenetic analysis among many types of rice of the genus Oryza.

The inventors of the present invention fused the 5′-upstream promoter region of the rice catalase CatA gene (Higo and Higo, Plant Mol. Biol. 30: 505-521 (1996)) with a reporter gene (GUS) and introduced the resulting construct into rice. Staining of tissue from the transformed plants showed that the reporter gene was strongly expressed in the anther and pollen. The inventors thus revealed that the promoter can be used to prepare a promoter cassette for expressing a useful gene in an anther or pollen, for example to produce male-sterile strains (International Publication WO 00/58454, published on Oct. 5, 2000). The CatA promoter region contains a base sequence of about 300 bases that is a transposable element belonging to the group called Tourist sequences (present at positions 164 to 515 of the base sequence in the above-described patent application and designated as Tourist-OsaCatA) (Iwamoto et al., Mol. Gen. Genet. 262: 493-500 (1999)) (FIG. 1).

However, to the present inventors' knowledge, prior to the disclosure of the present invention there has been no report that a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) has been successfully used as an indicator to detect a gene which is to be expressed in a desired organ.

In a seed plant such as rice, even if a transposon such as Tourist transposes into a chromosome, the transposon is not passed to progeny unless that chromosome belongs to a germ cell. It is therefore difficult to imagine that a transposon could be used as an indicator to detect a gene which is to be expressed in a desired organ.

The present inventors discovered that a transposon such as Tourist, which is present in a cell at a site containing a flower, can be passed to progeny and is associated with expression in a specific organ.

Because, based on the findings available at the time of the disclosure of the present invention, a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) was not believed to be related to a site of specific expression, the present invention provides an advantageous effect that could not have been predicted from conventional findings.

According to the present invention, a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) can also be used for organs other than the genital organs of a plant. This is because, in plants which reproduce vegetatively, an inserted transposon may be passed to progeny through a non-genital organ (e.g., plants reproduced through underground stems: iris (rhizome), arrowhead (corm), saffron (corm), potato (tuber), dahlia (tuber), lily (bulb), onion (bulb), bracken (underground stem), lotus (underground stem); roots: sweet potato (tuberous root); bulbils: yam (bulbil), garlic (bulbil); branches: strawberry (runner)). Therefore, those skilled in the art will clearly understand that, even when transposition occurs in a root, a stem, or a leaf rather than a flower, a gene whose expression is specific to an organ such as a root, a stem, or a leaf can be detected using a transposon sequence as a key.

As used herein, a “key” sequence refers to a sequence used in a gene search in accordance with the present invention. The key sequence may be electronic data for use in a computer, i.e., a “query” sequence, or a biological probe for in vitro and/or in vivo screening, i.e., a “probe” sequence. As used herein, a “query” sequence refers to a sequence used in a computer-based gene search, including the base sequence of DNA or RNA or the amino acid sequence of a protein, with which a database is searched. A “probe” sequence refers to a sequence used in a biological experiment, such as in vitro and/or in vivo screening.

As used herein, “gene population” includes, but is not limited to, a nonredundant population of mutually related gene data items in the form of electronic data, i.e., a “database”, and a biological population, i.e., a “library”. A “database” refers to a nonredundant group of mutually related gene data items, including electronic data. The database may be a DNA database or a protein database, and is preferably a DNA database or the like. As used herein, “library” refers to a group of mutually related genes for use in biological screening. The library may be a DNA library, an RNA library, or a protein library; a DNA library is preferable.

As used herein, “search” refers to using a certain nucleic acid base sequence, electronically, biologically, or otherwise, to find another nucleic acid base sequence. Examples of electronic searches include, but are not limited to, BLAST (Altschul et al., J. Mol. Biol. 215: 403-410 (1990)), FASTA (Pearson & Lipman, Proc. Natl. Acad. Sci. USA 85: 2444-2448 (1988)), the Smith and Waterman method (Smith and Waterman, J. Mol. Biol. 147: 195-197 (1981)), the Needleman and Wunsch method (Needleman and Wunsch, J. Mol. Biol. 48: 443-453 (1970)), and the like. Examples of biological searches include stringent hybridization, macroassays in which genomic DNA is attached to a nylon membrane or the like, microassays in which genomic DNA is attached to a glass plate (microarray assays), PCR, and in situ hybridization.
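As an illustration of one of the electronic search methods listed above, the following is a minimal sketch of the Smith and Waterman local alignment score with a linear gap penalty. The function name `smith_waterman` and the scoring values (match +2, mismatch -1, gap -2) are hypothetical choices for illustration only; they are not parameters specified in this document, and production tools such as BLAST use heuristics and substitution matrices instead.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between a and b (linear gap penalty)."""
    rows, cols = len(a) + 1, len(b) + 1
    prev = [0] * cols  # scores of the previous row of the dynamic-programming matrix
    best = 0
    for i in range(1, rows):
        curr = [0] * cols
        for j in range(1, cols):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap         # gap inserted in b
            left = curr[j - 1] + gap   # gap inserted in a
            curr[j] = max(0, diag, up, left)  # local alignment: scores never drop below zero
            best = max(best, curr[j])
        prev = curr
    return best
```

For example, `smith_waterman("TTACGTTT", "GGACGTGG")` scores only the shared local segment `ACGT` (4 matches x 2 = 8) and ignores the unrelated flanks, which is why local alignment suits finding an element embedded in a longer genomic sequence.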

As used herein, “comparative selection” of sequences refers to comparing two sequences, electronically, biologically, or otherwise, with respect to a certain nucleic acid base sequence (e.g., in electronic comparative selection, the two sequences are aligned so that the differences are determined unit by unit and a desired sequence is selected). Methods similar to those used in the above-described search may be applied to comparative selection. In an electronic search, a query sequence is compared with a large number of base sequences in a database to find the closest sequence. In a biological experiment (hybridization or the like), on the other hand, the sequence with the highest complementarity binds most strongly to a probe, and this binding corresponds to comparative selection. Comparative selection is also referred to herein simply as “selection”.
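Electronic comparative selection, i.e., picking the database entry closest to a query, can be sketched as follows. This is a minimal sketch under stated assumptions: the function `select_closest` and the toy identifiers are hypothetical, and Python's standard `difflib.SequenceMatcher.ratio()` is used as a stand-in similarity score, not the alignment scoring of BLAST or FASTA.

```python
from difflib import SequenceMatcher


def select_closest(query: str, database: dict[str, str]) -> tuple[str, float]:
    """Return (identifier, similarity) of the database entry most similar to the query."""
    best_id, best_score = "", -1.0
    for seq_id, seq in database.items():
        # ratio() = 2 * matched characters / (len(query) + len(seq));
        # autojunk=False stops long DNA strings from having common bases pruned as "junk".
        score = SequenceMatcher(None, query, seq, autojunk=False).ratio()
        if score > best_score:
            best_id, best_score = seq_id, score
    return best_id, best_score
```

With a toy database `{"g1": "TTTTTTTT", "g2": "ACGTACGA", "g3": "GGGGCCCC"}`, the query `"ACGTACGT"` selects `"g2"`, which differs from it at a single position.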

As used herein, “stringent conditions” for hybridization refer to conditions under which the complementary strand of a nucleotide strand having homology to a target sequence predominantly hybridizes with the target sequence, while the complementary strand of a nucleotide strand having no homology substantially does not hybridize. The “complementary strand” of a certain nucleic acid sequence refers to the nucleic acid sequence that pairs with it through hydrogen bonds between nucleic acid bases (e.g., T with A and C with G). Stringent conditions are sequence-dependent and vary with various environmental parameters; longer sequences hybridize specifically at higher temperatures. In general, stringent conditions use a temperature about 5° C. lower than the melting temperature (Tm) of the particular sequence at a defined ionic strength and pH. Tm is the temperature at which, at equilibrium under a defined ionic strength, pH, and nucleic acid concentration, 50% of the nucleic acid molecules complementary to a target sequence are hybridized to the target sequence. Stringent conditions are described in detail in Tijssen (1993), Laboratory Techniques in Biochemistry and Molecular Biology: Hybridization with Nucleic Acid Probes, Part I, Chapter 2, “Overview of principles of hybridization and the strategy of nucleic acid probe assay”, Elsevier, N.Y.
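The rule of thumb above (hybridize about 5° C. below Tm) can be made concrete with a rough Tm estimate. The sketch below assumes the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) in ° C., which is a common approximation for short oligonucleotides (roughly 14 to 20 nt) in standard salt; it is not the method prescribed in this document and ignores ionic strength, probe concentration, and mismatches. The function names are hypothetical.

```python
def wallace_tm(oligo: str) -> float:
    """Approximate Tm (deg C) of a short DNA oligo by the Wallace rule: 2(A+T) + 4(G+C)."""
    s = oligo.upper()
    at = s.count("A") + s.count("T")
    gc = s.count("G") + s.count("C")
    return 2.0 * at + 4.0 * gc


def stringent_temperature(oligo: str, offset: float = 5.0) -> float:
    """Hybridization temperature set `offset` deg C below the estimated Tm."""
    return wallace_tm(oligo) - offset
```

For the 14-mer `ACGTACGTACGTAC` (7 A/T and 7 G/C), the estimate is 2 x 7 + 4 x 7 = 42° C., so a stringent hybridization temperature of about 37° C. would be chosen under these assumptions.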

Microarray assay is a technology well known to those skilled in the art and is described in detail in DeRisi et al., Science 278: 680-686 (1997) and Chu et al., Science 282: 699-705 (1998).

As used herein, “gene” refers to a functional unit of heredity, which typically occupies a specific site (locus) on a chromosome. In general, a gene is reproduced accurately at cell division and controls the synthesis of a protein, such as an enzyme. A gene as a functional unit is made up of discontinuous segments of a DNA macromolecule. The DNA molecule contains an appropriate sequence of bases (A, T, G and C) encoding a specific peptide (amino acid sequence). Genetic information is typically carried by DNA and sometimes by RNA. As described above, a gene is typically present on a chromosome, and chromosomes occur in pairs, with exceptions such as the human male sex chromosomes (X and Y). Genes are typically present in pairs in any cell other than a gamete. A gene typically contains protein coding regions (exons) and, in addition, introns between the exons, an expression control region (promoter region) upstream of the first exon, and a region downstream of the protein coding region.

As used herein, “structural gene” refers to the portion of a gene, other than its promoter and introns, which directly determines the primary structure of a polypeptide in accordance with the genetic code.

As used herein, “homology” and “similarity” of genes or sequences are used interchangeably and refer to the degree of identity between two or more gene sequences. Herein, the degree of homology is determined by BLAST using its default parameters. The greater the homology between two genes, the greater the identity or similarity between their sequences. Whether two genes are homologous is determined either by comparing their sequences directly or by hybridization under stringent conditions. When two gene sequences are compared directly, the genes are homologous if typically at least 50%, preferably at least 70%, and more preferably at least 80%, 90%, 95%, 96%, 97%, 98%, or 99% of their DNA sequences are identical.
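The percentage thresholds above apply to two sequences that have already been aligned. The sketch below assumes the aligned strings use `-` for gaps and counts every aligned column, gaps included, in the denominator, which is one common convention; BLAST reports its own identity figure, and these function names are hypothetical.

```python
from typing import Optional


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned columns with identical bases; '-' marks a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Ignore columns where both sequences have a gap (they carry no information).
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if not (x == "-" and y == "-")]
    if not columns:
        return 0.0
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


def highest_tier(identity: float,
                 tiers: tuple[int, ...] = (99, 98, 97, 96, 95, 90, 80, 70, 50)) -> Optional[int]:
    """Return the strictest homology tier (in %) reached by `identity`, or None below 50%."""
    for tier in tiers:  # tiers are listed from strictest to loosest
        if identity >= tier:
            return tier
    return None
```

For instance, the alignment `ACGT-ACGT` / `ACGTTACGA` has 7 identical columns out of 9 (about 77.8%), which meets the "at least 70%" tier but not the 80% tier.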

As used herein, “site containing a flower” refers to any site of a plant that contains a flower. The site containing a flower may therefore contain plant organs other than a flower (e.g., lemma, palea, glume, rudimentary glume, and rachilla).

As used herein, “flower” is a reproductive structure characteristic of seed plants and has a vivid color in part or in its entirety. For example, the flower of rice consists of glumose flowers, a stamen, a pistil and a base.

As used herein, “stamen” refers to a male genital organ consisting of an anther and a filament, and “pistil” refers to a female genital organ consisting of a stigma, a style, and an ovary.

As used herein, “express at a site containing a flower” covers both expression only in a flower and expression in a flower and in other organs (tissues).

According to another aspect of the present invention, a method is provided for obtaining information on whether a base sequence similar to a transposable element sequence is present for a certain gene, so as to infer that the gene may be expressed in a specific plant organ or site associated with the transposable element sequence.

In the method of the present invention, the base sequence of a transposable element that is specific to an organ of a plant (e.g., a monocotyledon such as rice) other than a flower (e.g., organs including a blade, a leaf sheath, a root, and the like) can also be used. For example, if a certain transposable element is commonly present in the vicinity of several genes expressed in a root, the vast base sequence data of the rice genome can be screened for genes expressed in a root by using that element.

Conventionally, base sequence motifs in gene promoter sequences to which transcription regulatory factors bind have been studied as a clue for such a search. However, such short motifs introduce noise into a search, which makes their use practically impossible or very difficult. “Noise” in a search refers to a sequence that is identical to a known short motif but does not actually function as one, for example because it is not located at an appropriate position relative to the transcription initiation site. Currently available motif databases and analysis tools simply check whether short base sequences match, and this noise has not been reduced to a negligible level. Therefore, when a short sequence identical to a known motif is found in a DNA base sequence, it may be a functioning motif but is also highly likely to be mere noise, which makes the search inefficient. The present invention solves this problem: only by employing a transposable element of about 100 to 300 bases, as in the present invention, can the noise be reduced enough to search for a gene that is specifically expressed at a specific site.
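The difference in noise between a short motif and a 100 to 300 base element can be illustrated by the expected number of chance matches. The sketch below assumes an idealized random sequence with equal, independent base frequencies, which real genomes do not satisfy (they are base-biased and repetitive), and it counts only exact matches; it is an order-of-magnitude illustration, not a model used by the present invention. The function name is hypothetical.

```python
def expected_random_hits(motif_len: int, region_len: int, both_strands: bool = True) -> float:
    """Expected chance exact occurrences of one fixed motif in random DNA.

    Assumes equal, independent base frequencies (probability 1/4 per base).
    """
    positions = max(region_len - motif_len + 1, 0)  # possible start positions on one strand
    strands = 2 if both_strands else 1
    return strands * positions * 0.25 ** motif_len
```

Under these assumptions a 6-base motif is expected to occur by chance about 2 x 9,995 / 4,096 ≈ 4.9 times in a 10 kb region, whereas a fixed 115-base sequence is expected essentially never (below 1e-60), so a match to a long element is far less likely to be noise.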

The present invention also provides genes found by the above-described method of the present invention. These genes are therefore expressed at a site containing a flower.

The present invention also provides a method for modifying transcription in a plant using information about the genes found by the above-described method of the present invention. The modification may easily be carried out using molecular biological and/or biochemical techniques commonly used in the art.

FIG. 8 shows an exemplary configuration of a computer 8 according to the present invention for carrying out the method of the present invention. The computer 8 comprises a CPU 81, a main memory 82, a hard disk drive (HDD) 83, and an input interface 84. These components 81 to 84 are, for example, interconnected through a bus 86. Any type of memory may be used instead of the HDD 83.

The HDD 83 stores in advance a program representing automatic computation (hereinafter referred to as the automatic computation program). Alternatively, the automatic computation program may be recorded on any type of computer readable recording medium, such as a floppy disk, a CD-ROM, a CD-R, or a DVD-ROM. An automatic computation program stored on such a recording medium is loaded into the HDD 83 via an input apparatus (e.g., a disk drive).

The CPU 81 executes the automatic computation program stored in the HDD 83. Execution of the automatic computation program by the CPU 81 allows the computer 8 to function as an automatic computation apparatus according to the present invention.

An input device (e.g., a keyboard and a mouse) and the like may be connected to the input interface 84. The input device may be used to input data required by the computer 8.

A portion of the automatic computation program or of the data is optionally transferred to the main memory 82, which the CPU 81 can access at high speed.

A query sequence and/or a database for use in the present invention may be input into the computer 8 using, for example, an input device, and stored in the HDD 83 as a master file. The query sequence and/or the database may also be input to the computer 8 via a network (e.g., the Internet). A search result may optionally be output via an output interface (not shown), to which, for example, a display, a storage apparatus, or the like may be connected. The search result may be stored in the HDD 83, and may optionally be recorded on the above-described computer readable recording medium.

According to one aspect of the present invention, a method for detecting a gene at a desired site in a plant comprises the step of:

(1) searching a gene population using a transposon sequence as a key sequence.

The method further comprises the step of:

(2) selecting a gene having similarity to the transposon sequence in the vicinity of a putative protein coding region.

The transposon sequence may be a MITE sequence, and is preferably a Tourist sequence. Examples of the Tourist sequence include Tourist-A, B, C and D. Preferably, the transposon sequence is the Tourist-C sequence. More preferably, the Tourist sequence may be Tourist-OsaCatA (SEQ ID NO: 1). Even more preferably, the Tourist sequence may be the 115 bp sequence (SEQ ID NO: 2) at positions 109 to 223 of Tourist-OsaCatA.
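Positions in a sequence listing are 1-based and inclusive, so positions 109 to 223 span 223 - 109 + 1 = 115 bases. The following minimal sketch shows that conversion to a Python slice. The function name is hypothetical, and because the actual SEQ ID NO: 1 is not reproduced here, the example uses a placeholder sequence.

```python
def subsequence_1based(seq: str, start: int, end: int) -> str:
    """Return positions start..end (1-based, inclusive) of seq, as numbered in a sequence listing."""
    if not 1 <= start <= end <= len(seq):
        raise ValueError("positions out of range")
    return seq[start - 1:end]  # Python slices are 0-based and end-exclusive
```

With a 240-base placeholder such as `"ACGT" * 60`, `subsequence_1based(placeholder, 109, 223)` returns a 115-base string, matching the stated length of SEQ ID NO: 2.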

The transposon sequence of the present invention (e.g., a MITE sequence, such as a Tourist sequence) may contain a contiguous stretch of at least about 10, about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 75, about 100, about 115, about 125, about 150, about 200, about 250, or about 300 nucleotides of the sequence indicated by SEQ ID NO: 1 or 2. In one embodiment, the transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) may contain a sequence having at least about 70%, about 80%, about 90%, about 95%, or about 99% homology to the sequence indicated by SEQ ID NO: 1 or 2. In another embodiment, the Tourist sequence may have at least one substitution, addition or deletion relative to the sequence indicated by SEQ ID NO: 1, or relative to the sequence indicated by SEQ ID NO: 2, as long as the function of the present invention is maintained; the transposon sequence may, for example, have one or several substitutions, additions or deletions. More preferably, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 1. In another preferred embodiment, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 2.
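Whether a candidate contains a contiguous stretch of at least a given number of nucleotides from a reference (e.g., about 10 nucleotides of SEQ ID NO: 1) can be checked by computing the longest common substring. This is a minimal sketch: the function names and toy sequences are hypothetical, only the given strand is compared (reverse complements are not checked), and the quadratic dynamic programming is adequate for element-sized sequences rather than whole genomes.

```python
def longest_shared_stretch(candidate: str, reference: str) -> int:
    """Length of the longest contiguous run of nucleotides shared by candidate and reference."""
    best = 0
    prev = [0] * (len(reference) + 1)  # run lengths ending at the previous candidate base
    for i in range(1, len(candidate) + 1):
        curr = [0] * (len(reference) + 1)
        for j in range(1, len(reference) + 1):
            if candidate[i - 1] == reference[j - 1]:
                curr[j] = prev[j - 1] + 1  # extend the run along the diagonal
                best = max(best, curr[j])
        prev = curr
    return best


def contains_contiguous(candidate: str, reference: str, k: int = 10) -> bool:
    """True if candidate shares at least k contiguous nucleotides with reference."""
    return longest_shared_stretch(candidate, reference) >= k
```

For example, `GGGACGTACGTAAA` and `TTACGTACGTTT` share the 8-base stretch `ACGTACGT`, so they satisfy a threshold of 8 contiguous nucleotides but not 9.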

In another embodiment of the present invention, the above-described desired site may be a site containing a flower. The site containing a flower may be a flower. Preferably, the site containing a flower may contain at least one site selected from a stamen and a pistil. In another embodiment, examples of the site containing a flower include lemma, palea, glume, rudimentary glume, rachilla, and lodicule. In still another embodiment, the plant may be a monocotyledon, preferably rice.

In another embodiment, the above-described database may be a DNA database. Specific examples of the database include DDBJ, EMBL, and GenBank. When a biological technique is used, the above-described gene population may be a DNA library.

In one embodiment of the present invention, examples of the search method include BLAST, FASTA, the Smith and Waterman method, and the Needleman and Wunsch method. In another embodiment of the present invention, examples of the search method include stringent hybridization, microarray assay, PCR, and in situ hybridization.

In one embodiment of the present invention, the above-described vicinity of a putative protein coding region may be the region within about 2 kbp upstream of the translation initiation codon, the region within about 1.1 kbp downstream of the translation termination codon, and the introns.

In one embodiment of the present invention, the above-described similarity is at least about 66% homology. The similarity may be at least about 70%, at least about 75%, at least about 80%, at least about 85%, at least about 90%, at least about 95%, at least about 99% homology, or the like.
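Selection step (2) can be sketched as a filter over search hits that combines the vicinity windows above (about 2 kbp upstream of the initiation codon, about 1.1 kbp downstream of the termination codon, or within an intron) with the minimum similarity of about 66%. This is a minimal sketch under stated assumptions: the `GeneModel` structure, the function names, and the coordinates are hypothetical; the gene is assumed to lie on the plus strand with 1-based inclusive coordinates; and a hit must lie entirely inside one window (partial overlaps are not counted).

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class GeneModel:
    """Hypothetical plus-strand gene model, 1-based inclusive coordinates."""
    start_codon: int      # first base of the translation initiation codon
    stop_codon_end: int   # last base of the translation termination codon
    introns: list[tuple[int, int]] = field(default_factory=list)


def vicinity_region(gene: GeneModel, hit_start: int, hit_end: int,
                    upstream: int = 2000, downstream: int = 1100) -> Optional[str]:
    """Classify a hit as 'upstream', 'downstream', 'intron', or None (outside the vicinity)."""
    if gene.start_codon - upstream <= hit_start and hit_end < gene.start_codon:
        return "upstream"
    if gene.stop_codon_end < hit_start and hit_end <= gene.stop_codon_end + downstream:
        return "downstream"
    for intron_start, intron_end in gene.introns:
        if intron_start <= hit_start and hit_end <= intron_end:
            return "intron"
    return None


def keep_hit(gene: GeneModel, hit_start: int, hit_end: int,
             identity: float, min_identity: float = 66.0) -> bool:
    """Keep a hit that is similar enough and lies in the vicinity of the coding region."""
    return identity >= min_identity and vicinity_region(gene, hit_start, hit_end) is not None
```

For a gene whose initiation codon starts at position 5001, a hit at positions 3500 to 3800 with 70% identity would be kept as an upstream hit, while the same hit at 60% identity would be rejected.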

In another aspect, the present invention provides a composition for detecting a gene which is to be expressed at a site containing a flower, comprising a plasmid containing at least about 10 contiguous nucleotides of the sequence indicated by SEQ ID NO: 1. The nucleotides contained in the plasmid may be the transposon sequence of the present invention (e.g., a MITE sequence, such as a Tourist sequence), which may contain a contiguous stretch of at least about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 75, about 100, about 115, about 125, about 150, about 200, about 250, or about 300 nucleotides of the sequence indicated by SEQ ID NO: 1 or 2. In one embodiment, the MITE sequence or transposon sequence may contain a sequence having at least about 70%, about 80%, about 90%, about 95%, or about 99% homology to the sequence indicated by SEQ ID NO: 1 or 2. In another embodiment, the transposon sequence may have at least one substitution, addition or deletion relative to the sequence indicated by SEQ ID NO: 1, or relative to the sequence indicated by SEQ ID NO: 2, as long as the function of the present invention is maintained; it may, for example, have one or several substitutions, additions or deletions. More preferably, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 1. In another preferred embodiment, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 2.

In another aspect, the present invention provides a kit for detecting a gene which is to be expressed at a desired site in a plant. The kit comprises:

(1) a plasmid containing at least about 10 contiguous nucleotides of the sequence indicated by SEQ ID NO: 1; and

(2) a DNA library.

In the kit, the plasmid and the DNA library are as defined above.

In another aspect, the present invention provides a method for producing a gene which is to be expressed at a desired site in a plant. The method comprises the steps of:

(1) searching a gene population (e.g., a database) using a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a key sequence;

(2) selecting a gene having similarity to the above-described transposon sequence in a putative promoter region; and

(3) producing a nucleic acid molecule encoding the above-described gene.

In this method, steps (1) and (2) are the same as those described above. Methods for producing a nucleic acid molecule encoding a gene are well known in the art and are described, for example, in Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vol. 1 to 3, Cold Spring Harbor Laboratory (1989) or in Current Protocols in Molecular Biology, F. Ausubel et al., Eds., Greene Publishing and Wiley-Interscience, New York (1987). Preferably, the production may be carried out in vitro or in vivo.

In another aspect, the present invention provides a recording medium storing a program for allowing a computer to execute automatic computation for detecting a gene which is to be expressed at a desired site in a plant. The automatic computation comprises the steps of:

(1) providing a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

The search in steps (1) to (4) is as described above. Techniques relating to automatic computation are well known in the art and are described above.
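Steps (1) to (4) can be sketched as a small pipeline. This is a minimal sketch under stated assumptions: the database is an in-memory mapping of identifiers to sequences, and the "search" in step (3) counts shared k-mers between the query and each entry, a simplified stand-in for the seeding step of tools such as BLAST rather than BLAST itself. All names, the word size `k`, and the toy data are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """All distinct substrings of length k in seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def search_database(query: str, database: dict[str, str],
                    k: int = 11, min_shared: int = 1) -> list[tuple[str, int]]:
    """Steps (1)-(3): report entries sharing at least min_shared k-mers with the query, best first."""
    query_kmers = kmers(query.upper(), k)
    hits = []
    for seq_id, seq in database.items():
        shared = len(query_kmers & kmers(seq.upper(), k))
        if shared >= min_shared:
            hits.append((seq_id, shared))
    hits.sort(key=lambda hit: (-hit[1], hit[0]))  # most shared k-mers first, ties by identifier
    return hits


def format_results(hits: list[tuple[str, int]]) -> str:
    """Step (4): tab-delimited output, one hit per line."""
    return "\n".join(f"{seq_id}\t{shared}" for seq_id, shared in hits)
```

With the query `ACGTACGTACGTAAAA`, `k=4`, and a toy database in which `geneA` and `geneC` each share four 4-mers with the query while `geneB` shares none, the output lists `geneA` and `geneC` and omits `geneB`. In practice, the hits would then be passed to the selection of step (2) of the detection method.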

In another aspect, the present invention provides a program for allowing a computer to execute automatic computation for detecting a gene which is to be expressed at a desired site in a plant. The automatic computation comprises the steps of:

(1) providing a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

The search in steps (1) to (4) is as described above. Techniques relating to automatic computation are well known in the art and are described above.

As used herein, information processing refers to the calculation or processing of information according to the purpose of use, and software refers to a program relating to the operation of a computer. A program refers to an ordered sequence of instructions suitable for processing by a computer. A program is a product. A program list refers to the presentation of a program per se, for example by printing it on paper or displaying it on a screen. A computer readable recording medium storing a program refers to a computer readable recording medium on which a program is stored for installing, executing, distributing, or otherwise handling the program.

Procedure refers to a series of processes or operations linked in time series in order to achieve a predetermined purpose.

Data structure refers to a logical structure of data represented by the relationships among its elements. Hardware resource refers to a physical apparatus or physical element used to embody a process, an operation, or a function.

Examples of hardware resources include, as physical apparatuses, a computer and its components (i.e., a CPU, a memory, an input apparatus, and an output apparatus) and physical apparatuses connected to the computer.

In another aspect, the present invention provides a system for detecting a gene which is to be expressed at a desired site in a plant. The system comprises:

(A) a computer; and

(B) a program for allowing the computer to execute automatic computation for detecting a gene which is to be expressed at a desired site in a plant.

The automatic computation comprises the steps of:

(1) providing a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a query sequence;

(2) providing a database;

(3) searching the database using the query sequence; and

(4) outputting a result of the search.

The search in steps (1) to (4) is as described above. Techniques relating to automatic computation and computer systems are well known in the art and are described above.

In one embodiment, the above-described computer is linked to a network, preferably the Internet.

In another aspect, the present invention provides a method for inferring an organ of a plant in which a gene is to be expressed. The method comprises the step of:

(1) obtaining information on whether a base sequence similar to the sequence of a transposable element is present in the vicinity of the gene and, when such a similar sequence is present, inferring that the gene is to be expressed in a plant organ related to the transposable element sequence.
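The inference step can be sketched as follows: the region around a gene is scanned for the best ungapped match to each known element, and if the best identity reaches a threshold, the organ associated with that element is reported. This is a minimal sketch under stated assumptions: the element-to-organ mapping, the function names, and the toy sequences are hypothetical; ungapped window identity is a simplification of a full alignment; and the 66% default only echoes the similarity threshold mentioned for the detection method.

```python
from typing import Optional


def best_window_identity(element: str, region: str) -> float:
    """Highest ungapped percent identity between element and any equal-length window of region."""
    n = len(element)
    if n == 0:
        return 0.0
    best = 0.0
    for offset in range(len(region) - n + 1):
        window = region[offset:offset + n]
        matches = sum(1 for x, y in zip(element, window) if x == y)
        best = max(best, 100.0 * matches / n)
    return best


def infer_organ(region: str, elements: dict[str, tuple[str, str]],
                min_identity: float = 66.0) -> Optional[str]:
    """Return the organ linked to the best-matching element, or None if none reaches min_identity.

    `elements` maps a (hypothetical) element name to (element sequence, associated organ).
    """
    best_organ, best_identity = None, -1.0
    for sequence, organ in elements.values():
        identity = best_window_identity(sequence.upper(), region.upper())
        if identity >= min_identity and identity > best_identity:
            best_organ, best_identity = organ, identity
    return best_organ
```

For example, with a single toy element `ACGTTGCA` linked to "site containing a flower", a region containing that exact sequence is inferred to be expressed at a site containing a flower, whereas a poly-G region (25% identity at best) yields no inference.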

As used herein, “transposable element sequence” refers to a base sequence which transposes within chromosomal DNA or which appears to have been generated by transposition. Preferred examples of the transposable element sequence include Ac/Ds of maize and Tos17 of rice.

In one embodiment, the plant organ related to the transposable element sequence is a site containing a flower. In another embodiment, the site containing a flower contains a site selected from the group consisting of a stamen and a pistil. In one embodiment, the sequence similar to the transposable element sequence is a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence), which may contain a contiguous stretch of at least about 15, about 20, about 25, about 30, about 35, about 40, about 45, about 50, about 75, about 100, about 115, about 125, about 150, about 200, about 250, or about 300 nucleotides of the sequence indicated by SEQ ID NO: 1 or 2. In one embodiment, the transposon sequence may contain a sequence having at least about 70%, about 80%, about 90%, about 95%, or about 99% homology to the sequence indicated by SEQ ID NO: 1 or 2. In another embodiment, the transposon sequence may have at least one substitution, addition or deletion relative to the sequence indicated by SEQ ID NO: 1, or relative to the sequence indicated by SEQ ID NO: 2, as long as the function of the present invention is maintained; it may, for example, have one or several substitutions, additions or deletions. More preferably, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 1. In another preferred embodiment, the transposon sequence may be substantially or fully identical to the sequence indicated by SEQ ID NO: 2.

In another embodiment, the plant includes rice.

In another embodiment, the present invention provides a nucleic acid molecule encoding a gene obtained by the method of the present invention. Methods for producing such a nucleic acid molecule are well known to those skilled in the art and are described elsewhere in the present specification and in Sambrook et al. (supra).

In one embodiment, the present invention provides a nucleic acid molecule encoding a gene which is to be expressed at a desired site in a plant. The base sequence of the nucleic acid molecule is obtained by a method comprising the step of:

(1) searching a gene population using a transposon sequence (e.g., a MITE sequence, such as a Tourist sequence) as a key sequence.

In another aspect, the present invention also provides a recording medium storing a sequence encoding a gene obtained by the method of the present invention. Examples of the recording medium include a flexible disk, a hard disk, a CD-ROM, a CD-R, a CD-RW, a DVD-ROM, a tape, an MD, and those described above.

In another embodiment, the present invention provides a method for modifying the expression pattern of a gene of a plant, comprising the step of using the sequence of a gene obtained by the method of the present invention.

In another aspect, the present invention provides a kit for inferring a plant organ in which a gene is to be expressed. The kit comprises:

(1) a molecule having a transposable element sequence.

The transposable element sequence is briefly described above. The molecule may be a nucleic acid (DNA or RNA) or a derivative thereof. Nucleic acid derivatives are well known in the art.

In still another embodiment, the present invention provides a kit for inferring a plant organ in which a gene is to be expressed. The kit comprises:

(1) a recording medium storing a transposable element sequence.

The transposable element sequence is briefly described above. The recording medium is described in detail elsewhere in the present specification.

“Derivative” nucleotide refers to a nucleotide that contains a derivative of a nucleotide or that is linked to another nucleotide by a linkage different from the typical one. Specific examples of such nucleotides include a derivative nucleotide in which a phosphodiester bond of the original nucleotide is converted to a phosphorothioate bond, a derivative nucleotide in which a phosphodiester bond of the original nucleotide is converted to an N3′-P5′ phosphoroamidate bond, a derivative nucleotide in which the ribose and phosphodiester bond of the original nucleotide are converted to a peptide nucleic acid, a derivative nucleotide in which a uracil of the original nucleotide is substituted with a C-5 propynyl uracil, a derivative oligonucleotide in which a uracil of the original nucleotide is substituted with a C-5 thiazole uracil, a derivative nucleotide in which a cytosine of the original nucleotide is substituted with a C-5 propynyl cytosine, a derivative oligonucleotide in which a cytosine of the original nucleotide is substituted with a phenoxazine-modified cytosine, a derivative nucleotide in which a ribose in DNA is substituted with a 2′-O-propyl ribose, and a derivative nucleotide in which a ribose of the original nucleotide is substituted with a 2′-methoxyethoxy ribose.

In another embodiment, the present invention provides a recording medium storing a program for allowing a computer to execute automatic computation for inferring a plant organ in which a gene is to be expressed. The automatic computation comprises the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing a sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

The search in steps (1) to (4) is the same as described above. Techniques relating to the automatic computation are well known in the art and are also described above.
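Steps (1) to (4) can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the program of the invention: a plain Smith-Waterman local alignment with arbitrary scores (match +2, mismatch -1, linear gap -2) stands in for the BLAST search used in the examples, and both strands of the gene are scanned because Tourist-OsaCatA-like sequences occur on either strand (the "C" entries in Table 1). All function names are hypothetical.

```python
# Hypothetical sketch of steps (1)-(4): compare a transposable element (TE)
# query with a gene sequence by local alignment on both strands.
from typing import Dict

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score between a and b (linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def compare_te_with_gene(te_query: str, gene_seq: str) -> Dict[str, object]:
    """Steps (1)-(3): score the TE query against both strands of the gene."""
    query, gene = te_query.upper(), gene_seq.upper()
    plus = smith_waterman_score(query, gene)
    minus = smith_waterman_score(query, reverse_complement(gene))
    return {"plus": plus, "minus": minus,
            "score": max(plus, minus),
            "strand": "+" if plus >= minus else "-"}


if __name__ == "__main__":
    # Step (4): output the comparison result (toy sequences, not real data).
    te = "AAACCCGGGT"
    gene = "TTTT" + reverse_complement(te) + "TTTT"
    print(compare_te_with_gene(te, gene))
```

A practical implementation would more likely call BLAST, as in Example 1, and report the aligned regions and their coordinates rather than scores alone.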

In another embodiment, the present invention provides a program for allowing a computer to execute automatic computation for inferring a plant organ in which a gene is to be expressed. The automatic computation comprises the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing a sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

Techniques relating to the automatic computation and the computer system are well known in the art and are described above.

In another embodiment, the present invention provides a system for inferring a plant organ in which a gene is to be expressed. The system comprises:

(A) a computer; and

(B) a program for allowing the computer to execute automatic computation for inferring a plant organ in which a gene is to be expressed. The automatic computation comprises the steps of:

(1) providing a transposable element sequence as a query sequence;

(2) providing a sequence of the gene;

(3) comparing the query sequence with the sequence of the gene; and

(4) outputting a result of the comparison.

The search in steps (1) to (4) is the same as described above. Techniques relating to the automatic computation and the computer system are well known in the art and are described above. Preferably, the computer is connected to a network.

Examples of genes identified by the method of the present invention that are to be expressed in a site containing a flower are described below.

Among the genes isolated from rice to date, there are 12 genes, including CatA (Kay et al., Nucl. Acids Res. 17, 2865-2866 (1989); Yamaguchi-Shinozaki et al., Plant Mol. Biol. 14: 29-39 (1989); Kondo et al., J. Biol. Chem. 265, 15832-15837 (1991); Huang et al., Gene, 111, 223-228 (1992); Kawasaki et al., Mol. Gen. Genet. 237, 10-16 (1993); Minami and Tanaka, Biochim. Biophys. Acta 1171: 321-322 (1993); Nelson et al., Plant Mol. Biol. 25, 401-412 (1994); Chen and Bennetzen, Plant Mol. Biol. 32, 999-1001 (1996); Higo and Higo, Plant Mol. Biol. 30, 505-521 (1996); Song et al., Plant Cell, 9, 1279-1287 (1997); GenBank accession number U72255; GenBank accession number X89226), in the vicinity of which base sequences similar to Tourist-OsaCatA were found (Bureau et al., Proc. Natl. Acad. Sci. USA, 93: 8524-8529 (1996); Iwamoto et al., Mol. Gen. Genet. 262: 493-500 (1999)). These sequences are indicated by SEQ ID NOs: 3 to 14, respectively. The present invention is the first to find that these genes, except for CatA and HMGR, are expressed in a site containing a flower. Therefore, the present invention also provides nucleic acid molecules that contain these sequences and are expressed in a site containing a flower.

Expression of these genes in a flower, a root, a leaf, and an immature seed was analyzed by RT-PCR. As a result, expression of 8 genes was confirmed, all of which were expressed in the flower (FIG. 2). The inventors' research was the first to find expression of these genes, other than CatA and HMGR, in a flower. This demonstrates that the usefulness of the present invention could not readily have been predicted from the conventional state of the art.

The inventors carried out a further search and found three sequences similar to Tourist-OsaCatA among the DNA base sequences registered as ESTs (Iwamoto et al., Mol. Gen. Genet. 262: 493-500 (1999)). RT-PCR analysis confirmed that two of them were expressed in a flower (FIG. 3).

Further, a similarity search (BLAST) was carried out in a DNA database using the Tourist-OsaCatA base sequence as a query. As a result, 32 highly similar sequences were detected in the vicinity of 30 regions (CDSs) inferred to code for genes in BAC/PAC clones (Table 1). For 29 of the 30 CDSs, PCR using genomic DNA as a template gave a product of the expected size. These 29 CDSs were examined by RT-PCR for the presence or absence of expression in the blade, root, flower, and immature seed (Table 2). As a result, a product of the expected size was observed for 11 CDSs. All 11 of these CDSs were expressed either only in the flower or in the flower and other organs (FIG. 4).

TABLE 1 Tourist-OsaCatA-like base sequences detected in BAC/PAC clones

Tourist  Size (bp)  Accession No.  Location^(a)      Insertion site^(b)
Osa#1    333        AB023482       22909..23241      5′F(873)
Osa#2    325        AB023482       83129..83453      5′F(2308)
Osa#3    345        AB023482       95093..95440      5′F(1540)
Osa#4    339        AB023482       117761..117423C   5′F(840), 5′F(1516)
Osa#5    342        AB023482       145798..146139    3′F(445), 5′F(763)
Osa#6    319        AB026295       59927..59609C     intron-5
Osa#7    342        AJ243961       18655..18314C     5′F(1523)
Osa#8    343        AJ243961       33544..33202C     5′F(229)
Osa#9    327        AJ245900       19061..19387      5′F(467)
Osa#10   304        AJ245900       47364..47061C     3′F(446)
Osa#11   312        AJ245900       83914..84225      5′F(446)
Osa#12   334        AP000367       91902..91569C     3′F(480), 3′F(987)
Osa#13   342        AP000391       62380..62721      3′F(1212)
Osa#14   337        AP000399       139202..138865C   intron-1
Osa#15   345        AP000559       939..1283         intron-2
Osa#16   269        AP000559       6477..6745        5′F(1216)
Osa#17   347        AP000559       75417..75763      3′F(1065)
Osa#18   325        AP000559       80756..80432C     5′F(1496)
Osa#19   303        AP000570       8989..9291        3′C, 3′C
Osa#20   312        AP000570       9745..10056       intron-1
Osa#21   339        AP000570       28879..28541C     3′F(468)
Osa#22   343        AP000570       29304..28962C     5′F(191)
Osa#23   337        AP000570       37206..36870C     5′F(504)
Osa#24   342        AP000570       54719..54378C     5′F(954)
Osa#25   324        AP000570       111910..111587C   intron-1
Osa#26   343        AP000615       67875..67532C     5′F(1273)
Osa#27   333        AP000615       68678..68346C     5′F(2087)
Osa#28   318        AP000836       29854..29537C     3′F(858)
Osa#29   342        AP000836       99718..99377C     5′F(1209)
Osa#30   335        AP000836       155125..155459    5′F(1355)
Osa#31   345        AP000836       184537..184193C   intron-4
Osa#32   362        AP000837       125297..124936C   3′F(831)

^(a)Locations of the Tourist-OsaCatA-like sequences in base sequences registered in the DDBJ/EMBL/GenBank databases. C indicates the complementary strand.
^(b)Tourist-OsaCatA-like sequences were inserted in the 5′-upstream region (5′F), the 3′-downstream region (3′F), an intron, or the 3′-terminal region (3′C) of a CDS. The number in parentheses indicates the length (bp) between the Tourist-OsaCatA-like sequence and the nearby CDS.
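The 5′F/3′F categories and distances in footnote (b) can be derived from coordinates. The Python sketch below is one assumed way to do it, not the procedure stated in the source. It takes each CDS in the tables' own notation, where the first coordinate is the 5′ end of the CDS (so a complementary-strand CDS such as 80821..79730C has begin > end), and returns 5′F or 3′F with the distance in bp, or "internal" when the element overlaps the CDS span. Telling an intron insertion from a 3′C insertion needs exon coordinates, which the tables do not give, so that step is omitted. Plain coordinate differences reproduce the values in Tables 1 and 2 (for example, Osa#1 against CDS 1 gives 5′F(873)); the function name is hypothetical.

```python
# Hypothetical helper: classify a Tourist-like element relative to a nearby CDS.
from typing import Optional, Tuple


def classify_insertion(te_start: int, te_end: int,
                       cds_begin: int, cds_end: int) -> Tuple[str, Optional[int]]:
    """Return (category, distance_bp) for a TE relative to a CDS.

    cds_begin is the 5' end of the CDS, so cds_begin > cds_end means the CDS
    lies on the complementary strand. The TE may be given in either orientation.
    """
    te_lo, te_hi = min(te_start, te_end), max(te_start, te_end)
    if cds_begin <= cds_end:  # CDS on the plus strand
        if te_hi < cds_begin:
            return "5'F", cds_begin - te_hi
        if te_lo > cds_end:
            return "3'F", te_lo - cds_end
    else:  # CDS on the complementary strand: upstream has higher coordinates
        if te_lo > cds_begin:
            return "5'F", te_lo - cds_begin
        if te_hi < cds_end:
            return "3'F", cds_end - te_hi
    # Overlaps the CDS span; intron vs. 3'C would need exon coordinates.
    return "internal", None


if __name__ == "__main__":
    print(classify_insertion(22909, 23241, 24114, 28262))  # Osa#1 vs CDS 1
    print(classify_insertion(80756, 80432, 78936, 76828))  # Osa#18 vs CDS 17
```

Osa#19, listed as 3′C, 3′C, overlaps both CDS 18 and CDS 19 and is reported as "internal" by this sketch.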

Further, rice flower tissue was subdivided and subjected to RNA extraction, and the presence or absence of expression of 11 CDSs (protein coding regions presumed to be genes), 2 ESTs, and 8 genes was analyzed by RT-PCR. The results were as follows.

Expression in stamen and/or pistil: 12 cases

Expression in stamen, pistil, lemma/palea, and base: 5 cases

Expression in pistil, and lemma/palea: 1 case

Expression in stamen, pistil, and base: 1 case

No RT-PCR product confirmed (no clear amplification): 2 cases

Thus, it was demonstrated that the genes detected by the method of the present invention are “expressed in a site containing a flower”. Preferably, the genes obtained by the method of the present invention can be said to “have a relatively high probability (about 90% or more) of being expressed in the stamen and/or pistil of a flower”. For the genes whose products were not confirmed, the amount of RNA is considered to have been insufficient. The inventors therefore concluded that substantially all of the genes are actually expressed at a site containing a flower. The present invention can predict the expression of a gene obtained by screening with a relatively high probability (for example, about 90% or more, or about 95% or more), which was not achievable with conventional technology. The present invention is the first to achieve such an advantageous effect and usefulness.
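The "about 90%" figure follows from the counts listed above: the 11 CDSs, 2 ESTs, and 8 genes make 21 cases, and every category with a product includes the stamen or the pistil, so 19 of 21 cases (about 90.5%) were detected there. A small Python check of that arithmetic; it conservatively counts the two cases without a product as negatives, whereas the text attributes them to insufficient RNA.

```python
# Tally of the RT-PCR results for flower sub-organs listed above.
from fractions import Fraction

RESULTS = {
    "stamen and/or pistil": 12,
    "stamen, pistil, lemma/palea, and base": 5,
    "pistil and lemma/palea": 1,
    "stamen, pistil, and base": 1,
    "no RT-PCR product": 2,
}


def stamen_or_pistil_fraction(results: dict) -> Fraction:
    """Fraction of all cases detected in the stamen or pistil."""
    total = sum(results.values())
    # Every category with a product includes the stamen or the pistil.
    detected = total - results["no RT-PCR product"]
    return Fraction(detected, total)


if __name__ == "__main__":
    frac = stamen_or_pistil_fraction(RESULTS)
    print(sum(RESULTS.values()), frac, f"{float(frac):.1%}")
```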

As a result, it was demonstrated that the Tourist-OsaCatA base sequence can be used in a method for efficiently selecting genes that are expressed in a flower from rice genome base sequences in a DNA database.

The method of the present invention, which provides information about the organ (tissue) in which a gene is to be expressed, is of great benefit in the current situation, in which genome analysis can readily determine the base sequence of a genome but the functions of most genes remain unknown.

Further, a gene that Southern hybridization analysis of rice genomic DNA shows to have, in a portion of its putative promoter region, a sequence to which a probe (Tourist-OsaCatA) binds under stringent conditions is highly likely to be expressed in a flower. This approach is used for screening a rice genomic DNA library (FIGS. 5A and 5B).

An experiment using partially deleted promoters revealed that the Tourist-OsaCatA fragment itself is not essential for expression in a flower. It is considered that, at some point in the course of evolution, the DNA structure of genes actively expressed in the flower was relaxed and the transposase required for transposon insertion was also synthesized; as a result, the transposon Tourist-OsaCatA was inserted in the vicinity of genes that are expressed in the flower.

Hereinafter, the present invention will be described by way of examples. The examples are for purposes of illustration only. Therefore, the scope of the present invention is limited only by the claims, not by the examples.

EXAMPLES

Techniques used in the examples below are well known in the art and are described in, for example, Sambrook et al., Molecular Cloning: A Laboratory Manual (2nd Ed.), Vols. 1 to 3, Cold Spring Harbor Laboratory (1989). In the examples below, Tourist-OsaCatA is used as an example of a transposon sequence. The present invention is not limited to this specific sequence and can easily be modified by those skilled in the art who review the present specification.

Example 1 Comparison of Base Sequences

A similarity search was conducted in the DNA databases DDBJ (Rel. 35), GenBank (Rel. 111), and EMBL (Rel. 58), accessed from the web site of the DNA bank of the National Institute of Agrobiological Resources (http://www.DNA.affrc.go.jp), using the BLAST algorithm (Altschul et al., Nucl. Acids Res. 25: 3389-3402 (1997)). Base sequences highly similar to Tourist-OsaCatA (score value of at least 40) were compared with the base sequence of Tourist-OsaCatA using the genetic information processing software GENETYX-MAC (Software Kaihatsu K.K., Tokyo).
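Example 1 keeps hits with a score value of at least 40. As a sketch only, the Python below applies such a cutoff to BLAST+ tabular output (`-outfmt 6`, whose default 12th column is the bit score). That format is an assumption: the example used the BLAST service of the DNA bank web site, and neither its output format nor the exact score definition is stated here. The sketch also writes each subject location in the style of Table 1 (start..end, with C appended when the hit lies on the complementary strand, i.e. sstart > send). Function names are hypothetical.

```python
# Sketch: filter BLAST+ tabular hits (-outfmt 6) by score and format locations.
from typing import Dict, List

# Default column order of BLAST+ -outfmt 6.
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def filter_hits(tabular: str, min_score: float = 40.0) -> List[Dict[str, str]]:
    """Keep hits whose bit score is at least min_score."""
    hits = []
    for line in tabular.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank and comment lines
        row = dict(zip(FIELDS, line.split("\t")))
        if float(row["bitscore"]) >= min_score:
            hits.append(row)
    return hits


def table_location(hit: Dict[str, str]) -> str:
    """Format the subject range as start..end, adding C for the minus strand."""
    sstart, send = int(hit["sstart"]), int(hit["send"])
    suffix = "C" if sstart > send else ""
    return f"{sstart}..{send}{suffix}"
```

Hits that pass the cutoff would then be checked for proximity to an annotated CDS, as described in the next paragraph.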

As a result, 228 BAC (bacterial artificial chromosome) clones and 11 PAC (P1-derived artificial chromosome) clones were detected. Among them, 32 Tourist-OsaCatA-like sequences were confirmed to be present in the vicinity of a CDS in BAC/PAC clones for which information about regions inferred to code for a gene (CDSs) is registered (Table 1).

Example 2 RT-PCR

Total RNA of rice (variety: Nipponbare) was prepared from blades, roots, glumose flowers, and immature seeds using RNeasy (Qiagen, Hilden, Germany), in accordance with the manual supplied with RNeasy. The prepared total RNA was treated with DNase I (Life Technologies, Rockville, Md. USA) and then subjected to RT-PCR using the Superscript One-Step RT-PCR system (Life Technologies), with 50 ng of total RNA as a template. cDNA synthesis was carried out at 47° C. for 40 minutes, followed by 94° C. for 2 minutes. Thereafter, 24 to 40 cycles of 94° C. for 2 minutes, 52° C. for 2 minutes, and 72° C. for 2 minutes were carried out. The resulting amplified DNA fragments were examined by agarose gel electrophoresis and further by base sequence analysis.
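The cycling program above can be written down as data, which makes its run time easy to check. This Python sketch counts hold times only and ignores temperature ramping; the step labels are descriptive names chosen here, not terms from the kit manual.

```python
# Sketch: the one-step RT-PCR program of Example 2 as data, with run-time arithmetic.
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    minutes: float


# Run once: cDNA synthesis at 47 C, then a 94 C hold.
PRE_CYCLING: List[Step] = [Step("cDNA synthesis", 47, 40), Step("94 C hold", 94, 2)]
# Repeated 24 to 40 times in Example 2 (27 to 40 times in Example 4).
CYCLE: List[Step] = [Step("denature", 94, 2), Step("anneal", 52, 2), Step("extend", 72, 2)]


def total_minutes(cycles: int) -> float:
    """Total hold time of the program, ignoring temperature ramps."""
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return sum(s.minutes for s in PRE_CYCLING) + cycles * sum(s.minutes for s in CYCLE)


if __name__ == "__main__":
    for n in (24, 40):
        print(n, "cycles:", total_minutes(n), "min")
```

With these holds, 24 cycles take 186 minutes and 40 cycles take 282 minutes, before ramp time is added.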

As a result, expression in one or more organs was detected for 8 (including CatA) of the 12 genes carrying a Tourist C element (classification of Bureau and Wessler, Proc Natl Acad Sci USA 91: 1411-1415 (1994)) with a structure similar to that of the Tourist-OsaCatA element of CatA (FIG. 2), for 2 of the 3 ESTs having a sequence highly similar to Tourist-OsaCatA among DNA registered as ESTs (FIG. 3), and for 11 of the 29 CDSs present in the vicinity of Tourist-OsaCatA-like sequences (Table 2). All of the genes, ESTs, and CDSs for which expression was detected were also expressed in a flower.

TABLE 2 CDSs used in RT-PCR analysis (putative protein coding regions)

CDS  Accession No.  Location^(a)      DNA-derived product (bp)  mRNA-derived product (bp)  Putative protein^(b)
1    AB023482       24114..28262      552                       552                        AP2 domain containing protein
2    AB023482       80821..79730C     318                       297                        RING-H2 finger protein
3    AB023482       96980..98055      387                       211                        homocitrate synthase
4    AB023482       116583..115633C   803                       287                        ND
5    AB023482       147866..146584C   466                       185                        Pro-rich protein
6    AB026295       55634..60562      1299                      419                        ND
7    AJ243961       20178..21866      1230                      399                        ND
8    AJ243961       32999..30355C     2024                      413                        ND
9    AJ245900       18594..13572C     802                       161                        small Gln-rich tetratricopeptide repeat-containing protein
10   AJ245900       55548..47810C     366                       279                        Ser/Thr kinase
11   AJ245900       84671..85891      506                       425                        peroxidase-like protein
12   AP000367       97928..92382C     374                       188                        citrate synthetase
13   AP000391       57914..61168      2389                      379                        ND
14   AP000399       139582..138524C   281                       237                        ND
15   AP000559       40..4073          665                       206                        ND
16   AP000559       7961..14114       837                       248                        protein kinase
17   AP000559       78936..76828C     192                       192                        Arg decarboxylase
18   AP000570       8488..9016        492                       389                        ND
19   AP000570       11735..9270C      2443                      280                        ND
20   AP000570       26400..28073      579                       208                        ND
21   AP000570       29495..31839      1768                      146                        ND
22   AP000570       37710..40502      392                       156                        ND
23   AP000570       53424..50022C     411                       199                        syntaxin related protein
24   AP000570       112576..109828C   283                       283                        ND
25   AP000836       26019..28679      338                       254                        ND
26   AP000836       98168..95361C     527                       159                        ribosomal protein L30
27   AP000836       156814..158260    254                       215                        ND
28   AP000836       186228..183159C   1792                      220                        ND
29   AP000837       121445..124105    277                       193                        ND

^(a)Locations of the CDSs in base sequences registered in the DDBJ/EMBL/GenBank databases. C indicates the complementary strand.
^(b)ND indicates that the sequence had no significant similarity to genes registered in a database.

Example 3 Hybridization

Total DNA of rice (variety: Nipponbare) was prepared from blades in accordance with the method of Murray and Thompson (Nucl. Acids Res. 8: 4321-4325 (1980)). The prepared total DNA was digested with HindIII/XhoI, EcoRV/HindIII, BamHI, or EcoRI. The digested DNA was separated by 1% agarose gel electrophoresis and then transferred to a nylon membrane. Southern hybridization was carried out using DNA fragments containing Tourist-OsaCatA as a probe, in a hybridization solution containing 50% formamide, 5×SSC, 1× Denhardt's solution, 1 mM EDTA (pH 8.0), 0.1% SDS, and 0.1 mg/ml salmon sperm DNA, at 42° C. for 1 day. The probe was labeled with ³²P by the method of Feinberg and Vogelstein (Anal. Biochem. 132: 6-13 (1983)). The membrane was washed in 2×SSC, 0.5% SDS at 60° C. for 60 minutes (low stringency) and then in 0.1×SSC, 0.5% SDS at 65° C. for 60 minutes (high stringency).
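The hybridization solution above can be assembled from concentrated stocks using C1V1 = C2V2. The Python sketch below assumes common stock concentrations (100% formamide, 20×SSC, 50× Denhardt's solution, 0.5 M EDTA, 10% SDS, 10 mg/ml salmon sperm DNA) and a 10 ml batch; these are illustrative choices, not values given in Example 3.

```python
# Sketch: volumes of assumed stocks for the Example 3 hybridization solution.
from typing import Dict, Tuple

# name -> (final concentration, assumed stock concentration), same unit per pair.
RECIPE: Dict[str, Tuple[float, float]] = {
    "formamide (%)": (50.0, 100.0),
    "SSC (x)": (5.0, 20.0),
    "Denhardt's (x)": (1.0, 50.0),
    "EDTA pH 8.0 (mM)": (1.0, 500.0),
    "SDS (%)": (0.1, 10.0),
    "salmon sperm DNA (mg/ml)": (0.1, 10.0),
}


def stock_volumes(final_ml: float,
                  recipe: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Apply C1*V1 = C2*V2 to each component; water makes up the rest."""
    volumes = {name: final_ml * final / stock
               for name, (final, stock) in recipe.items()}
    water = final_ml - sum(volumes.values())
    if water < 0:
        raise ValueError("stocks too dilute for the requested final volume")
    volumes["water"] = water
    return volumes


if __name__ == "__main__":
    for name, ml in stock_volumes(10.0, RECIPE).items():
        print(f"{name:28s} {ml:6.3f} ml")
```

For 10 ml this gives 5 ml formamide, 2.5 ml 20×SSC, 0.2 ml 50× Denhardt's, 20 µl 0.5 M EDTA, 0.1 ml each of 10% SDS and salmon sperm DNA, and 2.08 ml water.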

Southern hybridization analysis detected several strong signal bands and a number of weak signal bands. It is believed that the strong signal bands correspond to DNA fragments containing a base sequence highly similar to Tourist-OsaCatA, and the weak signal bands correspond to DNA fragments containing a base sequence with relatively low similarity to Tourist-OsaCatA (FIG. 5).

Example 4 Expression in Each Portion of a Flower

Total RNA from each part of the glumose flower of rice (variety: Nipponbare) was prepared from stamens, pistils, lemmas/paleae, or glumose flower bases (rachilla, glume, rudimentary glume, lodicule) using RNeasy (Qiagen), in accordance with the manual supplied with RNeasy. The prepared total RNA was treated with DNase I (Life Technologies) and then subjected to RT-PCR using the Superscript One-Step RT-PCR system (Life Technologies), with 50 ng of total RNA as a template. cDNA synthesis was carried out at 47° C. for 40 minutes, followed by 94° C. for 2 minutes. Thereafter, 27 to 40 cycles of 94° C. for 2 minutes, 52° C. for 2 minutes, and 72° C. for 2 minutes were carried out. The resulting amplified DNA fragments were examined by agarose gel electrophoresis.

As a result, for 7 of the 8 genes shown in FIG. 2 and for the 2 ESTs with detected expression in FIG. 3, expression was detected in a stamen or a pistil (FIG. 7A). For the A1 gene, multiple bands of different lengths were amplified, and amplification of the DNA fragment of the intended length was suppressed. Further, for 10 of the 11 CDSs shown in FIG. 4, expression was detected in a stamen or a pistil (FIG. 7B). For CDS3, primer-dimer formation was frequently observed, and amplification of the DNA fragment of the intended length was suppressed.

Example 5 Homology Comparison

A base sequence (115 bp; SEQ ID NO: 2) from the middle portion of Tourist-OsaCatA was compared with the corresponding regions of 12 Tourist-OsaCatA-like sequences (SEQ ID NOs: 26 to 37) located in the vicinity of CDSs whose expression was detected by RT-PCR. The homology ranged from 65.8 to 90.4% (average: 82.9%) (FIG. 6).
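Homology values such as those above are commonly reported as percent identity over an alignment. The exact definition used by GENETYX-MAC is not given here, so the Python sketch below assumes one common convention: identical non-gap positions divided by the number of alignment columns, with gap columns counted. Other conventions (for example, dividing by the length of the shorter sequence) give different values. The function name is hypothetical.

```python
# Sketch: percent identity of two pre-aligned sequences ('-' marks a gap).
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical non-gap columns divided by all alignment columns, times 100."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]  # drop columns gapped in both
    if not columns:
        raise ValueError("empty alignment")
    matches = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(round(percent_identity("ACGT-A", "ACGTTA"), 1))
```

Under this convention, an average such as the 82.9% reported above would presumably be the arithmetic mean of the 12 per-sequence values.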

Publications, references, and patent applications cited herein are incorporated by reference in their entirety.

The invention has been described above in some detail by way of illustration. The examples are provided to help in understanding the present invention. Those skilled in the art will understand from the teaching of the present invention that the examples may be changed and modified without departing from the gist or spirit of the appended claims.

INDUSTRIAL APPLICABILITY

The transposon sequence of the present invention (e.g., a MITE sequence, such as a Tourist sequence, for example the base sequence of the Tourist-type transposable element (Tourist-OsaCatA) found in the promoter region of the rice CatA gene) can be used to efficiently screen for genes that are to be expressed in a site containing a flower. These genes are therefore useful for developing promoters specific to anthers and pollen, for breeding rice and other crops by genetically modifying genes relating to the anthers or pollen of a flower, or by modifying components of each tissue of a flower.

1. A method for detecting a gene which is expressed in a flower and other organs in a rice plant, comprising the steps of: (1) searching a gene population using a Tourist C transposon sequence consisting of SEQ ID NO: 1 as a key sequence, (2) selecting a gene having the transposon sequence in the vicinity of a putative protein coding region, and (3) detecting expression of said gene in the flower and other organs.
2. The method according to claim 1, wherein the expression of said gene includes expression in at least one site selected from a stamen and a pistil.
3. The method according to claim 1, wherein the gene population is a library and the key sequence is a probe sequence.
4. The method according to claim 3, wherein the database is a DNA library.
5. The method according to claim 3, wherein the search is carried out by a search method selected from the group consisting of stringent hybridization, microarray assay, PCR, and in situ hybridization.
6. The method according to claim 1, wherein the vicinity of the putative protein coding region is within about 2 kbp downstream of a translation termination codon, and within an intron.
7. A method for inferring an organ of a rice plant containing a flower and other organs in which a gene is expressed, comprising the step of: (1) obtaining information about whether or not a base sequence of a Tourist C transposable element sequence consisting of SEQ ID NO: 1 is present in the vicinity of the gene and, when the sequence is present in the vicinity of the gene, inferring that the gene is expressed in the plant organ containing a flower relating to the Tourist C transposable element sequence.
8. The method according to claim 7, wherein the organ containing a flower contains a site selected from the group consisting of a stamen and a pistil.
9. A method for modifying an expression pattern of a gene of a plant, comprising the step of utilizing the sequence of a gene obtained by a method according to claim 7.